Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)
Internal identifier: 000581 (Main/Exploration); previous: 000580; next: 000582
Authors: Günter Mühlberger [Austria]
Source:
- Zeitschrift für Bibliothekswesen und Bibliographie [ISSN 0044-2380]; 2011.
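French descriptors
- Pascal (Inist): Technologie information communication, Reconnaissance optique caractère, Numérisation.
- Wicri:
- topic: Numérisation.
English descriptors
- KwdEn: Digitizing, Information communication technology, Optical character recognition.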
Abstract
Optical character recognition (OCR) is a key technology that is indispensable when systematically digitizing historical newspapers. Although OCR often achieves a word accuracy of only 80% or less for newspapers of the 19th and early 20th century, the resulting imperfect files nevertheless provide a basis for a number of interesting applications - from full-text searching to indexing by search engines and online correction by users. However, in comparison to traditional digitization projects, the use of OCR requires a fundamental change of thinking in project planning, workflow design, the implementation of quality control, and the design of long-term preservation and presentation of digitized material on the Internet.
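The abstract quotes a word accuracy of 80% without saying how it is measured. As a rough illustration only, the sketch below assumes one common definition: one minus the word-level edit distance between a hand-checked reference text and the OCR output, divided by the number of reference words. The function name `word_accuracy` and the sample strings are hypothetical and are not taken from the article.

```python
def word_accuracy(reference: str, recognized: str) -> float:
    """Return 1 - (word-level edit distance / number of reference words).

    Assumed definition for illustration; the article may measure accuracy differently.
    """
    ref = reference.split()
    hyp = recognized.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # Dynamic-programming Levenshtein distance computed over word tokens.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # word missing from OCR output
                               current[j - 1] + 1,       # extra word in OCR output
                               previous[j - 1] + cost))  # match or substitution
        previous = current

    # Clamp at zero: heavily garbled output can need more edits than there are reference words.
    return max(0.0, 1.0 - previous[-1] / len(ref))


# Hypothetical ten-word sample in which the OCR output misreads two words.
reference = "die Zeitung erschien am Montag in Innsbruck und in Wien"
recognized = "die Zeitnng erschien am Montag in Jnnsbruck und in Wien"
print(f"{word_accuracy(reference, recognized):.2f}")  # prints 0.80
```

At this level, two out of every ten words are wrong, so an exact-match search for "Innsbruck" would miss the sample line above. That is one reason the abstract lists online correction by users among the applications.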
Affiliations: Austria (Günter Mühlberger)
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000146
- to stream PascalFrancis, to step Corpus: 000157
- to stream PascalFrancis, to step Curation: 000627
- to stream PascalFrancis, to step Checkpoint: 000135
- to stream Main, to step Merge: 000587
- to stream Main, to step Curation: 000581
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="GER" level="a">Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)</title>
<author><name sortKey="Muhlberger, Gunter" sort="Muhlberger, Gunter" uniqKey="Muhlberger G" first="Günter" last="Mühlberger">Günter Mühlberger</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Universitäts- und Landesbibliothek Tirol, Abteilung für Digitalisierung und elektronische Archivierung, Innrain 52,</s1>
<s2>6020 Innsbruck</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<wicri:noRegion>6020 Innsbruck</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">11-0198412</idno>
<date when="2011">2011</date>
<idno type="stanalyst">PASCAL 11-0198412 INIST</idno>
<idno type="RBID">Pascal:11-0198412</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000146</idno>
<idno type="stanalyst">FRANCIS 11-0198412 INIST</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000157</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000627</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000135</idno>
<idno type="wicri:doubleKey">0044-2380:2011:Muhlberger G:digitalisierung:historischer:zeitungen</idno>
<idno type="wicri:Area/Main/Merge">000587</idno>
<idno type="wicri:Area/Main/Curation">000581</idno>
<idno type="wicri:Area/Main/Exploration">000581</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="GER" level="a">Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)</title>
<author><name sortKey="Muhlberger, Gunter" sort="Muhlberger, Gunter" uniqKey="Muhlberger G" first="Günter" last="Mühlberger">Günter Mühlberger</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Universitäts- und Landesbibliothek Tirol, Abteilung für Digitalisierung und elektronische Archivierung, Innrain 52,</s1>
<s2>6020 Innsbruck</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<wicri:noRegion>6020 Innsbruck</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Zeitschrift für Bibliothekswesen und Bibliographie</title>
<title level="j" type="abbreviated">Z. Bibliothekswes. Bibliogr.</title>
<idno type="ISSN">0044-2380</idno>
<imprint><date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Zeitschrift für Bibliothekswesen und Bibliographie</title>
<title level="j" type="abbreviated">Z. Bibliothekswes. Bibliogr.</title>
<idno type="ISSN">0044-2380</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Digitizing</term>
<term>Information communication technology</term>
<term>Optical character recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Technologie information communication</term>
<term>Reconnaissance optique caractère</term>
<term>Numérisation</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">OCR recognition is a key technology which cannot be circumvented when systematically digitizing historical newspapers. Although often achieving a word accuracy of only 80% or less for newspapers of the 19th and early 20th century, these imperfect files nevertheless provide a basis for a number of interesting applications - from full-text searching to indexing by search engines and online correction by users. However, in comparison to traditional digitization projects, the use of OCR requires a fundamental change of thinking during the project planning, the design of the workflow, the implementation of quality control, and in the designing of long-term preservation and presentation of digitized material on the Internet.</div>
</front>
</TEI>
<affiliations><list><country><li>Autriche</li>
</country>
</list>
<tree><country name="Autriche"><noRegion><name sortKey="Muhlberger, Gunter" sort="Muhlberger, Gunter" uniqKey="Muhlberger G" first="Günter" last="Mühlberger">Günter Mühlberger</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000581 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000581 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:11-0198412 |texte= Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR) }}
This area was generated with Dilib version V0.6.32.